A Sample-Efficient Algorithm for Episodic Finite-Horizon MDP with Constraints

Authors

Abstract

Constrained Markov decision processes (CMDPs) formalize sequential decision-making problems whose objective is to minimize a cost function while satisfying constraints on various cost functions. In this paper, we consider the setting of episodic fixed-horizon CMDPs. We propose an online algorithm which leverages the linear programming formulation of finite-horizon CMDPs for repeated optimistic planning to provide a probably approximately correct (PAC) guarantee on the number of episodes needed to ensure a near-optimal policy, i.e., one whose resulting objective value is close to the optimal value and which satisfies the constraints within a low tolerance, with high probability. The number of episodes needed is shown to have linear dependence on the sizes of the state and action spaces and quadratic dependence on the time horizon and on an upper bound on the number of possible successor states for a state-action pair. Therefore, if this upper bound is much smaller than the size of the state space, the number of episodes needed becomes linear in the sizes of the state and action spaces and quadratic in the time horizon.
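
For orientation, the linear programming formulation mentioned in the abstract is commonly written over occupancy measures. The sketch below is the standard form for a known model, not necessarily the exact program used in the paper. All symbols are assumed notation introduced here for illustration: $q_h(s,a)$ is the probability of being in state $s$ and taking action $a$ at step $h$, $c$ is the objective cost, $d_i$ and $\xi_i$ are the $i$-th constraint cost and its threshold, $\mu$ is the initial state distribution, $P$ is the transition kernel, and $H$ is the horizon.

```latex
\begin{aligned}
\min_{q \ge 0} \quad & \sum_{h=1}^{H} \sum_{s,a} q_h(s,a)\, c(s,a) \\
\text{s.t.} \quad & \sum_{h=1}^{H} \sum_{s,a} q_h(s,a)\, d_i(s,a) \le \xi_i, \quad i = 1,\dots,m, \\
& \sum_{a} q_1(s,a) = \mu(s) \quad \forall s, \\
& \sum_{a} q_{h+1}(s',a) = \sum_{s,a} P(s' \mid s,a)\, q_h(s,a) \quad \forall s',\ h = 1,\dots,H-1.
\end{aligned}
```

Under this notation, a policy can be read off a feasible solution as $\pi_h(a \mid s) = q_h(s,a) / \sum_{a'} q_h(s,a')$ wherever the denominator is positive. In an optimistic-planning scheme of the kind the abstract describes, the unknown $P$ would presumably be replaced by a confidence set built from the transitions observed so far; the details of that step are not given on this page.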

Similar resources

Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning

Recently, there has been significant progress in understanding reinforcement learning in discounted infinite-horizon Markov decision processes (MDPs) by deriving tight sample complexity bounds. However, in many real-world applications, an interactive learning agent operates for a fixed or bounded period of time, for example tutoring students for exams or handling customer service requests. Such...

Finite horizon robust model predictive control with terminal cost constraints

In this paper, we develop a finite horizon model predictive control algorithm which is robust to modelling uncertainties. A moving average system matrix is constructed to capture modelling uncertainties and facilitate the future output prediction. The paper is mainly focused on the step tracking problem. Using linear matrix inequality techniques, the design is converted into a semi-definite opt...

Online Learning with Expert Advice and Finite-Horizon Constraints

In this paper, we study a sequential decision making problem. The objective is to maximize the average reward accumulated over time subject to temporal cost constraints. The novelty of our setup is that the rewards and constraints are controlled by an adverse opponent. To solve our problem in a practical way, we propose an expert algorithm that guarantees both a vanishing regret and a sublinear...

Finite-Horizon Markov Decision Processes with State Constraints

Markov Decision Processes (MDPs) have been used to formulate many decision-making problems in science and engineering. The objective is to synthesize the best decision (action selection) policies to maximize expected rewards (minimize costs) in a given stochastic dynamical environment. In many practical scenarios (multi-agent systems, telecommunication, queuing, etc.), the decision-making probl...

A memory-efficient algorithm for multiple sequence alignment with constraints

MOTIVATION: Recently, the concept of the constrained sequence alignment was proposed to incorporate the knowledge of biologists about structures/functionalities/consensuses of their datasets into sequence alignment such that the user-specified residues/nucleotides are aligned together in the computed alignment. The currently developed programs use the so-called progressive approach to efficientl...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i9.16979